Scanner Refactor #3240

Merged: 22 commits, merged Sep 29, 2024

@majora2007 (Member) commented Sep 28, 2024

This is a canary release for testing the new scanner. These scanner enhancements are part of an effort to make the scanner more reliable and to make ingesting changes after the first scan faster. The new code will likely make the first scan take longer, but given the scope of what Kavita does, a slower first scan is an acceptable trade-off.

If you are testing this, please report back to me with: how long the first scan took (relative to the size of the library), whether ingestion of changes is fast afterwards, and whether files are being picked up and scanned correctly (i.e., if you update a nested directory, you should see only that directory scanned).

Do not use this on your main server. I make no guarantees that it will work exactly as intended (although spot checks against my production library look pretty good).

Changed

  • Changed: Optimized a number of methods within the Scanner to reduce memory usage and CPU time
  • Changed: The scanner can now parse files in parallel when a directory contains more than 100 files
  • Changed: Reworked how dirty directories are detected and scanned. The scanner now parses bottom-up to reduce potential misses and to keep different layouts from behaving differently. This adds extra I/O checks but is much more reliable, and it should reduce the work needed to ingest changes after the first scan.
  • Changed: LocalizedSeries merging with Series is now done at a higher level and is much more reliable
  • Changed: The scanner now pre-saves all new Genres/Tags before processing Series to avoid foreign key (FK) issues.
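The parallel-parse rule in the list above can be sketched as a simple size check. This is a hypothetical Python illustration, not Kavita's actual C# code: `parse_directory`, `PARALLEL_THRESHOLD` and the thread pool are assumptions; only the 100-file cutoff comes from the PR notes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")

# The 100-file cutoff comes from the PR notes; the rest is a hypothetical sketch.
PARALLEL_THRESHOLD = 100


def parse_directory(files: List[str], parse: Callable[[str], T],
                    max_workers: int = 4) -> List[T]:
    """Parse a directory's files sequentially, or in parallel when it is large."""
    if len(files) <= PARALLEL_THRESHOLD:
        # Small directories: the overhead of a worker pool is not worth it.
        return [parse(f) for f in files]
    # Large directories: fan out; executor.map returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(parse, files))
```

A thread pool keeps the sketch short; for CPU-bound parsing in CPython a process pool would be the more realistic choice. The point is only the size check that picks between the two paths.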
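One way to read "parses bottom-up" for dirty-directory detection is: visit the deepest directories first and let a dirty child mark its parent dirty. The sketch below is an illustration of that idea, not the scanner's implementation; `dirty_directories_bottom_up` and its `last_write` input (directory path mapped to last write time) are hypothetical.

```python
import posixpath
from typing import Dict, List, Set


def dirty_directories_bottom_up(last_write: Dict[str, float],
                                last_scan: float) -> List[str]:
    """Return directories changed since last_scan, deepest first.

    A directory counts as dirty if it changed itself or if any
    descendant listed in last_write changed.
    """
    # Deepest directories first (depth = number of "/" separators).
    ordered = sorted(last_write, key=lambda p: p.count("/"), reverse=True)
    dirty: Set[str] = set()
    for path in ordered:
        if last_write[path] > last_scan:
            dirty.add(path)
        if path in dirty:
            # Propagate upward: a dirty child makes its parent dirty too.
            parent = posixpath.dirname(path)
            if parent in last_write:
                dirty.add(parent)
    return [p for p in ordered if p in dirty]
```

Because parents are visited after their children, a change deep in the tree reaches every ancestor even when the ancestors' own timestamps never moved.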
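Pre-saving lookup entities means every Genre (or Tag) a series will reference already has a row, and therefore a key, before any series row points at it. A minimal sketch in which a plain dict stands in for the Genre table; `ParsedSeries`, `normalize`, `presave_genres` and `link_series` are hypothetical names, and the name normalization is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParsedSeries:
    # Hypothetical parsed series; only the genre names matter for this sketch.
    name: str
    genres: List[str] = field(default_factory=list)


def normalize(name: str) -> str:
    # Assumed normalization so "Action" and "action " map to the same row.
    return name.strip().lower()


def presave_genres(series: List[ParsedSeries],
                   genre_ids: Dict[str, int]) -> Dict[str, int]:
    """Insert every unseen genre once, before any series is processed.

    genre_ids stands in for the Genre table: normalized name -> primary key.
    """
    wanted = {normalize(g) for s in series for g in s.genres}
    new_names = wanted - set(genre_ids)
    next_id = max(genre_ids.values(), default=0) + 1
    for name in sorted(new_names):  # sorted only to make ids deterministic
        genre_ids[name] = next_id
        next_id += 1
    return genre_ids


def link_series(s: ParsedSeries, genre_ids: Dict[str, int]) -> List[int]:
    # Every lookup succeeds because presave_genres ran first: no dangling key.
    return [genre_ids[normalize(g)] for g in s.genres]
```

Doing the inserts in one pass up front also means two series that introduce the same new genre cannot race to create it twice.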

Fixed

  • Fixed: Fixed a bug where the Series cover image could be taken from Volume 0 instead of Volume 1
  • Fixed: Fixed a bug where the lowest folder for a Series could be calculated incorrectly when there are nested folders. This includes a migration that clears out existing entries so that series scans don't miss files.
    Example (assuming / is the library root):
    /love hina/love hina/v01.cbz
    /love hina/specials/sp01.cbz

The lowest series folder was being computed as /love hina/love hina/ rather than /love hina/, so a series scan wasn't picking up the full series (all folders).
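For the cover fix, a sketch of preferring Volume 1 over Volume 0, assuming (hypothetically) that 0 acts as a placeholder for chapters without a real volume number. `pick_cover_volume` is an invented name, not part of Kavita.

```python
from typing import List, Optional


def pick_cover_volume(volume_numbers: List[float]) -> Optional[float]:
    """Pick the volume whose cover represents the series.

    Assumption: 0 is a placeholder volume, so any real (> 0) volume wins.
    """
    numbered = [v for v in volume_numbers if v > 0]
    if numbered:
        return min(numbered)
    # No numbered volumes: fall back to whatever exists (e.g. only volume 0).
    return min(volume_numbers) if volume_numbers else None
```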
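The lowest-folder fix amounts to taking the deepest folder that contains every file of the series. A minimal sketch using `posixpath.commonpath`; `lowest_series_folder` is a hypothetical helper, and clamping the result to the library root is an assumption.

```python
import posixpath
from typing import List


def lowest_series_folder(files: List[str], library_root: str) -> str:
    """Deepest folder containing every file of a series (hypothetical helper)."""
    folders = [posixpath.dirname(f) for f in files]
    common = posixpath.commonpath(folders)
    # Never climb above the library root.
    if posixpath.commonpath([common, library_root]) != library_root:
        return library_root
    return common
```

With the example above, the two folders /love hina/love hina and /love hina/specials share /love hina, which is the folder a series scan needs to cover.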

Optimized GetFileWithCertainExtensions to use yield and reduce memory overhead.
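A generator-based analogue of `GetFileWithCertainExtensions` shows why `yield` saves memory: paths are produced one at a time instead of being collected into a list first. The Python function below is hypothetical; the recursive walk and the case-insensitive extension match are assumptions, not the original method's behavior.

```python
import os
from typing import Iterable, Iterator


def get_files_with_extensions(path: str, extensions: Iterable[str]) -> Iterator[str]:
    """Lazily yield files under path whose extension (e.g. ".cbz") matches."""
    wanted = {e.lower() for e in extensions}
    for root, _dirs, files in os.walk(path):
        for name in files:
            if os.path.splitext(name)[1].lower() in wanted:
                # yield hands back one path at a time instead of building a list
                yield os.path.join(root, name)
```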
…otChangedSinceLastScan() reports that nothing has changed when a directory underneath has been deleted, renamed, etc., so critical work is missed.

I need a new strategy to handle this case.
… files. This avoids issues with the OS not reporting reliable data and with us having to over-scan.
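The underlying problem can be illustrated with timestamps: a folder's modification time generally changes only when its direct entries are added, removed or renamed, so a change several levels down can leave the top folder looking untouched. A minimal sketch that checks everything beneath the folder instead; `changed_since` is a hypothetical name, and a full walk like this trades extra I/O for reliability.

```python
import os


def changed_since(path: str, last_scan: float) -> bool:
    """True if path or anything beneath it was modified after last_scan.

    Checking only os.path.getmtime(path) is not enough: a directory's mtime
    typically changes only when its direct entries change.
    """
    if os.path.getmtime(path) > last_scan:
        return True
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            if os.path.getmtime(os.path.join(root, name)) > last_scan:
                return True
    return False
```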

Specials don't group, so I need to look into that.
…cate scans on image libraries (like the last write doesn't work).

I noticed some weird parsing with Special directories, so I excluded them a bit.

Some subdirectories will always report as dirty, usually in image libraries with subfolders or unusual layouts. There's not much I can do about that.
…nga and comicvine type layouts. Code is also much easier to understand.
…1 was already scanned.

The only remaining issue is with image layouts, where chapters are being replaced, likely because they aren't being merged in TrackSeries.
…f chapters due to the new scanning mechanism.

I'm not sure whether I'm still on the right track with the scanner loop. It works well on some content and poorly on others. But I am finding bugs in other pieces of code, so I will continue down this path to hopefully clean up more of the scanner.
…an parser info by parser info. Because the scanner now processes incomplete changes, we need to shift it up.
@majora2007 added the enhancement (New feature or request) and Scanner (Anything involving main Scan Loop) labels Sep 28, 2024
@majora2007 merged commit a792ad8 into canary Sep 29, 2024
3 checks passed
@majora2007 deleted the bugfix/scanner-optimization-rewrite branch September 29, 2024 19:45